Yunhao Tang

The Llama 4 Herd: Architecture, Training, Evaluation, and Deployment Notes

Jan 15, 2026

Ministral 3

Jan 13, 2026

Voxtral

Jul 17, 2025

Magistral

Jun 12, 2025

On a few pitfalls in KL divergence gradient estimation for RL

Jun 11, 2025

LlamaRL: A Distributed Asynchronous Reinforcement Learning Framework for Efficient Large-scale LLM Training

May 29, 2025

Learning to chain-of-thought with Jensen's evidence lower bound

Mar 25, 2025

RL-finetuning LLMs from on- and off-policy data with a single algorithm

Mar 25, 2025

Optimizing Language Models for Inference Time Objectives using Reinforcement Learning

Mar 25, 2025

Soft Policy Optimization: Online Off-Policy RL for Sequence Models

Mar 07, 2025